Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CFTimeIndex calendar in repr #4092

Merged
merged 37 commits into from
Jul 23, 2020
Merged

CFTimeIndex calendar in repr #4092

merged 37 commits into from
Jul 23, 2020

Conversation

aaronspring
Copy link
Contributor

@aaronspring aaronspring commented May 25, 2020

Done:

  • added calendar property to CFTimeIndex
  • rebuild repr from pandas

Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aaronspring, sorry for taking a while to get back to you. These are good questions.

inheritance is disencouraged, how should I extend the __repr__ coming from pd.Index? Should I try to rebuild pd.Index?

I'll admit, I'm not an expert in writing array-like reprs. I agree though that we should do what we can to avoid relying on private pandas API. Have you looked into what this would entail? Is there a big web of internal methods we'd have to copy over?

how to deal with calendar=365(6)_day which get internally converted into noleap/all_leap?

I think I am ok with this. My sense is that it is more important that indexes with the same date type have the same calendar attribute than that the calendar argument passed to cftime_range is propagated verbatim to the index it produces. What do you think?

xarray/tests/test_cftimeindex.py Outdated Show resolved Hide resolved
@dcherian
Copy link
Contributor

dcherian commented Jun 2, 2020

how should I extend the __repr__ coming from pd.Index

if you define a __repr__ method under CFTimeIndex, won't it override pd.Index.__repr__?

@aaronspring
Copy link
Contributor Author

how should I extend the __repr__ coming from pd.Index

if you define a __repr__ method under CFTimeIndex, won't it override pd.Index.__repr__?

Yes. I will do that. My question was whether should replicate the pd.Index.repr or try to import or inherit as much as I can from pandas?

@dcherian
Copy link
Contributor

dcherian commented Jun 2, 2020

Ah now I understand your questions. Thanks for clarifying.

My question was whether should replicate the pd.Index.repr or try to import or inherit as much as I can from pandas?

Maybe start by making the smallest possible change to the pandas repr?

EDIT: If we want to build our own repr, there are some helpful functions in core/formatting.py

@aaronspring
Copy link
Contributor Author

I was hoping to inherit from pandas like:

    def __repr__(self):
        super().__repr__()
        return self.__repr__().strip(")")+f", calendar={self.calendar}')"

But I will now try to rebuild as in pandas.

@aaronspring aaronspring changed the title WIP: CFTimeIndex calendar in repr CFTimeIndex calendar in repr Jun 3, 2020
@pep8speaks
Copy link

pep8speaks commented Jun 3, 2020

Hello @aaronspring! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-07-19 17:01:25 UTC

@aaronspring
Copy link
Contributor Author

This doesnt change the html repr yet: when clicking on the data symbol, calendar is not shown in the printout. I will further try to fix this

@aaronspring
Copy link
Contributor Author

I do understand the meaning of this last sentence @spencerkclark

My sense is that it is more important that indexes with the same date type have the same calendar attribute than that the calendar argument passed to cftime_range is propagated verbatim to the index it produces. What do you think?

Should I rather test on cftimeindex created in a different way?

@aaronspring
Copy link
Contributor Author

Currently only the cftimeindex repr shows the calendar property. I aim to get it into the dataset/dataarray repr.

would it be a good idea to modify formatting.py.array_repr(array)? IndexVariable uses this repr. Once time is a coordinate, it becomes an IndexVariable. but the index variable doesnt have this cftime.calender property anymore. Should I try to add this calendar property to the array or would this not be desired?

@aaronspring aaronspring changed the title CFTimeIndex calendar in repr [WIP]: CFTimeIndex calendar in repr Jun 6, 2020
@aaronspring
Copy link
Contributor Author

aaronspring commented Jun 6, 2020

I found a workaround with to_index() in formatting.short_data_repr. this is makes one test fail: xarray/tests/test_dask.py TestDataArrayAndDataset.test_dataarray_repr. is this ok or not a valid way to implement a cftimeindex.repr?

EDIT: I ensure now that to_index is only done when if CFTimeIndex.

Ready for review.

@aaronspring aaronspring changed the title [WIP]: CFTimeIndex calendar in repr CFTimeIndex calendar in repr Jun 6, 2020
@aaronspring aaronspring changed the title CFTimeIndex calendar in repr CFTimeIndex calendar in repr and coords repr from to_index Jun 6, 2020
@aaronspring aaronspring changed the title CFTimeIndex calendar in repr and coords repr from to_index WIP: CFTimeIndex calendar in repr Jun 9, 2020
Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the ping @aaronspring -- my apologies for being slow to respond! I appreciate the effort to go all the way toward reproducing the pandas repr. My remaining concern, and it is a little nit-picky, is that in its current form, there is no way to limit the width of the repr, e.g. with xarray.set_options(display_width=40) (see below for example for an existing xarray repr). Do you think you might be able to enable that?

In [1]: import xarray as xr
In [2]: ds = xr.tutorial.open_dataset("rasm")

In [3]: ds
Out[3]:
<xarray.Dataset>
Dimensions:  (time: 36, x: 275, y: 205)
Coordinates:
  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
    xc       (y, x) float64 ...
    yc       (y, x) float64 ...
Dimensions without coordinates: x, y
Data variables:
    Tair     (time, y, x) float64 ...
Attributes:
    title:                     /workspace/jhamman/processed/R1002RBRxaaa01a/l...
    institution:               U.W.
    source:                    RACM R1002RBRxaaa01a
    output_frequency:          daily
    output_mode:               averaged
    convention:                CF-1.4
    references:                Based on the initial model of Liang et al., 19...
    comment:                   Output from the Variable Infiltration Capacity...
    nco_openmp_thread_number:  1
    NCO:                       "4.6.0"
    history:                   Tue Dec 27 14:15:22 2016: ncatted -a dimension...

In [4]: xr.set_options(display_width=40)
Out[4]: <xarray.core.options.set_options at 0x7fbdc834a668>

In [5]: ds
Out[5]:
<xarray.Dataset>
Dimensions:  (time: 36, x: 275, y: 205)
Coordinates:
  * time     (time) object 1980-09-16...
    xc       (y, x) float64 ...
    yc       (y, x) float64 ...
Dimensions without coordinates: x, y
Data variables:
    Tair     (time, y, x) float64 ...
Attributes:
    title:                     /works...
    institution:               U.W.
    source:                    RACM R...
    output_frequency:          daily
    output_mode:               averaged
    convention:                CF-1.4
    references:                Based ...
    comment:                   Output...
    nco_openmp_thread_number:  1
    NCO:                       "4.6.0"
    history:                   Tue De...

len_item = 19 # length of one item in repr
# shorten repr for more than 100 items
max_width = (19 + 1) * 100 if len(self) <= 100 else 22 * len_item
datastr = format_array_flat(self.values, max_width)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think format_array_flat is a good choice if we want a 2-line repr, but for including more values it might be cleaner to write our own logic, rather than add commas and line breaks afterwards. As you've picked up on, I think the fact that we can treat the length of each element of the repr in a cftime array as a constant simplifies things greatly.

This is of course ignoring the potential for five-digit years; however, we already assume we won't see those in at least one other place in xarray (partial string indexing). At some point it might be good address that, but I think for now it's ok to stick with the four-digit year assumption. Particularly here, I think the worst that would happen is that the repr might potentially be a few characters wider than the imposed limit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, for now I use format_array_flat and insert commata and linebreaks manually. Are you suggestion I should rather write a new function? This function would probably use much of the code of format_array_flat.

concerning the 5digit years: xr.cftime_range(start='10000',periods=2) fails with ValueError: no ISO-8601 match for string: 10000

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now wrote a new function format_cftimeindex_array like format_array_flat. Hope this is what you were hoping to see. Both functions share much of the code especially in the beginning of the function. Should I refactor these shared code parts into a small function that both functions use?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

concerning the 5digit years: xr.cftime_range(start='10000',periods=2) fails with ValueError: no ISO-8601 match for string: 10000

Oh right, we use the same string parsing logic in cftime_range as in partial datetime string indexing. I was thinking of dates one might read in from a file, which get decoded through cftime.num2date. Anyway I acknowledge that is an issue we don't need to address at the moment!

I now wrote a new function format_cftimeindex_array like format_array_flat. Hope this is what you were hoping to see.

Sorry I was thinking something more along these lines for the code that formats the times (the rest of the repr can be added around what it generates):

CFTIME_REPR_LENGTH = 19

def format_row(times, indent=0, separator=", ", row_end=",\n"):
    return indent * " " + separator.join(map(str, times)) + row_end


def format_times(
    index,
    max_width,
    offset,
    separator=", ",
    first_row_offset=0,
    intermediate_row_end=",\n",
    last_row_end=""
):
    n_per_row = max(max_width // (CFTIME_REPR_LENGTH + len(separator)), 1)
    n_rows = int(np.ceil(len(index) / n_per_row))
    
    representation = ""
    for row in range(n_rows):
        indent = first_row_offset if row == 0 else offset
        row_end = last_row_end if row == n_rows - 1 else intermediate_row_end  
        times_for_row = index[row * n_per_row:(row + 1) * n_per_row]
        representation = representation + format_row(
            times_for_row,
            indent=indent,
            separator=separator,
            row_end=row_end
        )
        
    return representation

In other words iteratively generating each row in the repr, inserting the separator as you build each row, and inserting line breaks at the end of each row. I just find it fits in my head better than adding those elements post-hoc. I think you should be able to leverage the code above to construct a "split" repr as well (e.g. one that shows only the first and last 10 elements of the index) by calling format_times twice with the appropriate arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok. Now I think I get the idea...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the really nice template

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implemented thanks to your nice template given above. ready for review @spencerkclark

@aaronspring
Copy link
Contributor Author

now aligns to display_width:

for dw in [40,60,80,120]:
    with xr.set_options(display_width=dw):
        print(time[:2],dw,'\n')
CFTimeIndex([2000-01-01 00:00:00,
             2000-01-02 00:00:00],
            dtype='object',
            length=2,
            calendar='gregorian') 40 

CFTimeIndex([2000-01-01 00:00:00, 2000-01-02 00:00:00],
            dtype='object', length=2, calendar='gregorian') 60 

CFTimeIndex([2000-01-01 00:00:00, 2000-01-02 00:00:00],
            dtype='object', length=2, calendar='gregorian') 80 

CFTimeIndex([2000-01-01 00:00:00, 2000-01-02 00:00:00], dtype='object', length=2, calendar='gregorian') 120

@aaronspring
Copy link
Contributor Author

dont understand why isort fails

@keewis
Copy link
Collaborator

keewis commented Jul 9, 2020

that's #4204. #4206 pinned the version of isort, so you should be able to get a green CI by merging master into your feature branch.

Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aaronspring; this is looking very close! Just a few more minor suggestions.

xarray/core/formatting.py Outdated Show resolved Hide resolved
xarray/coding/cftimeindex.py Outdated Show resolved Hide resolved
xarray/tests/test_cftimeindex.py Outdated Show resolved Hide resolved
xarray/tests/test_cftimeindex.py Outdated Show resolved Hide resolved
xarray/coding/cftimeindex.py Outdated Show resolved Hide resolved
@aaronspring
Copy link
Contributor Author

Thanks @aaronspring; this is looking very close! Just a few more minor suggestions.

I hope this is the final one. all tests pass. implemented your suggestions. took me a few commits, but I learned a lot. thanks for the guidance @spencerkclark

Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aaronspring! I pushed a few more minor edits, mostly to spruce up the documentation a bit. Otherwise this looks good to me. Barring any comments from others I'll merge it sometime next week.

doc/whats-new.rst Outdated Show resolved Hide resolved
Co-authored-by: keewis <keewis@users.noreply.github.com>
@spencerkclark spencerkclark merged commit 0b706a4 into pydata:master Jul 23, 2020
@spencerkclark
Copy link
Member

Thanks again @aaronspring!

@dcherian
Copy link
Contributor

Thanks @aaronspring this is a great contribution.

@aaronspring
Copy link
Contributor Author

aaronspring commented Jul 23, 2020

My pleasure. Learnt a lot.

And it was a long-standing issue that was mentioned in a medium article about xarray looking for more contributors.

dcherian added a commit to jacobtomlinson/xarray that referenced this pull request Jul 24, 2020
* upstream/master:
  Added xarrays-spatial and updated geoviews link (pydata#4262)
  update docs to point to xarray-contrib and xarray-tutorial (pydata#4252)
  Add release summary, some touch-ups (pydata#4217)
  CFTimeIndex calendar in repr (pydata#4092)
  fix the RTD timeouts (pydata#4254)
  update isort CI and pre-commit hook (pydata#4204)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Indicate calendar type in CFTimeIndex repr
5 participants